DOCUMENT RESUME

ED 137 343                                        TM 006 14[illegible]

AUTHOR        Macready, George B.; Dayton, C. Mitchell
TITLE         Statistical Comparisons among Hierarchies Based on
              Latent Structure Models. Research Monograph 77-1.
INSTITUTION   Maryland Univ., College Park. Dept. of Measurement
              and Statistics.
PUB DATE      Apr 77
NOTE          25p.; Paper presented at the Annual Meeting of the
              American Educational Research Association (61st, New
              York, New York, April 4-8, 1977)

EDRS PRICE    MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS   *Goodness of Fit; *Hypothesis Testing; *Mathematical
              Models; Probability; Role Conflict; Standard Error of
              Measurement; *Statistical Analysis; Tests of
              Significance; True Scores
IDENTIFIERS   Domain Referenced Tests; *Latent Structure
              Analysis



ABSTRACT

A probabilistic hypothesis testing procedure to
assess the fit of hypothesized hierarchical structures for test item
data is discussed. Statistical procedures are presented which are
useful for evaluating the fit to data of a certain class of
probabilistic models. These models apply to sets of dichotomous (0,1)
responses for which there are posited to exist a priori dependence
structures. Examples of relevant types of data are success/failure
patterns from Piagetian tasks, learning hierarchies, and domain
referenced tests, as well as agree/disagree responses from attitude
tests. (Author/RC)






Research Monograph 77-1

STATISTICAL COMPARISONS AMONG HIERARCHIES
BASED ON LATENT STRUCTURE MODELS

George B. Macready
C. Mitchell Dayton

April 1977

Department of Measurement and Statistics
College of Education
University of Maryland






I. THEORY

Introduction

The purpose of this paper is to present statistical procedures which are
useful for evaluating the fit to data of a certain class of probabilistic models.
These models apply to sets of dichotomous (0,1) responses for which there are
posited to exist a priori dependency structures. Examples of relevant types of
data are success/failure patterns from Piagetian tasks, learning hierarchies,
and domain referenced tests, as well as agree/disagree responses from attitude
instruments.

Summary of the Model

Using the notation from Dayton and Macready (1976), where the model is
developed in more detail, we assume K distinct tasks, each of which can be
scored 0,1 for a sample of n individuals. Corresponding to an a priori dependency
structure, a hypothetical set of 0,1 response patterns (true score patterns) exists
which would typify an "ideal" group of respondents (i.e., a group matching the
latent structure). We let

(1)    $$P(u_s) = \sum_{j=1}^{q} P(u_s \mid v_j)\,\theta_j$$

be the probability of an observed response vector, $u_s$, where there are $q$
hypothetical true score patterns, $v_j$, $j = 1, \ldots, q$, with relative frequencies of
occurrence, $\theta_j$ ($\sum_j \theta_j = 1$). The conditional probabilities, $P(u_s \mid v_j)$, are
"recruitment" probabilities which connect the observed response patterns to the
true score patterns. The general class of recruitment probabilities which are
of interest takes on the form:

(2)    $$P(u_s \mid v_j) = \prod_{i=1}^{K} \alpha_i^{a_{ij}} (1-\alpha_i)^{b_{ij}} \beta_i^{c_{ij}} (1-\beta_i)^{d_{ij}}$$

The parameters $\alpha_i$ and $\beta_i$ are "error" probabilities which are interpretable,
respectively, as "intrusion" (e.g., guessing) and "omission" (e.g., forgetting)
error rates, while the coefficients, $a_{ij}$ through $d_{ij}$, are 0 or 1 and are chosen
to correspond to the particular $u_s$ and $v_j$ vectors involved.



For example, with K = 3 and an a priori Guttman scale, the 4 true score
patterns would be $v_1 = (0\ 0\ 0)'$, $v_2 = (1\ 0\ 0)'$, $v_3 = (1\ 1\ 0)'$, and $v_4 = (1\ 1\ 1)'$,
and the different possible observed vectors, $u_s$, are the $2^3 = 8$ ordered sets of
0's and 1's: (0 0 0)', (1 0 0)', (0 1 0)', (1 1 0)', (0 0 1)', (1 0 1)', (0 1 1)',
(1 1 1)'. Each of the 8 possible observed vectors may arise from any one of the
4 true score patterns by suitable choices for the coefficients $a_{ij}$ through $d_{ij}$.
For simplicity, let $\bar\alpha_i = 1 - \alpha_i$ and $\bar\beta_i = 1 - \beta_i$; then, the recruitment
probabilities, $P(u_s \mid v_j)$, are:



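Recruitment probabilities by observed pattern (rows) and true score pattern (columns):

$$
\begin{array}{c|cccc}
\text{Observed pattern} & (0\ 0\ 0)' & (1\ 0\ 0)' & (1\ 1\ 0)' & (1\ 1\ 1)' \\
\hline
(0\ 0\ 0)' & \bar\alpha_1\bar\alpha_2\bar\alpha_3 & \beta_1\bar\alpha_2\bar\alpha_3 & \beta_1\beta_2\bar\alpha_3 & \beta_1\beta_2\beta_3 \\
(1\ 0\ 0)' & \alpha_1\bar\alpha_2\bar\alpha_3 & \bar\beta_1\bar\alpha_2\bar\alpha_3 & \bar\beta_1\beta_2\bar\alpha_3 & \bar\beta_1\beta_2\beta_3 \\
(0\ 1\ 0)' & \bar\alpha_1\alpha_2\bar\alpha_3 & \beta_1\alpha_2\bar\alpha_3 & \beta_1\bar\beta_2\bar\alpha_3 & \beta_1\bar\beta_2\beta_3 \\
(1\ 1\ 0)' & \alpha_1\alpha_2\bar\alpha_3 & \bar\beta_1\alpha_2\bar\alpha_3 & \bar\beta_1\bar\beta_2\bar\alpha_3 & \bar\beta_1\bar\beta_2\beta_3 \\
(0\ 0\ 1)' & \bar\alpha_1\bar\alpha_2\alpha_3 & \beta_1\bar\alpha_2\alpha_3 & \beta_1\beta_2\alpha_3 & \beta_1\beta_2\bar\beta_3 \\
(1\ 0\ 1)' & \alpha_1\bar\alpha_2\alpha_3 & \bar\beta_1\bar\alpha_2\alpha_3 & \bar\beta_1\beta_2\alpha_3 & \bar\beta_1\beta_2\bar\beta_3 \\
(0\ 1\ 1)' & \bar\alpha_1\alpha_2\alpha_3 & \beta_1\alpha_2\alpha_3 & \beta_1\bar\beta_2\alpha_3 & \beta_1\bar\beta_2\bar\beta_3 \\
(1\ 1\ 1)' & \alpha_1\alpha_2\alpha_3 & \bar\beta_1\alpha_2\alpha_3 & \bar\beta_1\bar\beta_2\alpha_3 & \bar\beta_1\bar\beta_2\bar\beta_3
\end{array}
$$
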



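The total probability for a given observed pattern is the weighted sum of the
appropriate recruitment probabilities using the weights, $\theta_j$. E.g.,

$$P(u_s = (0\ 1\ 0)') = \theta_1\bar\alpha_1\alpha_2\bar\alpha_3 + \theta_2\beta_1\alpha_2\bar\alpha_3 + \theta_3\beta_1\bar\beta_2\bar\alpha_3 + \theta_4\beta_1\bar\beta_2\beta_3$$
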
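The construction in equations (1) and (2) can be sketched in a few lines of code. The following is a minimal illustration, not the authors' FORTRAN programs: the function names and the numerical values of $\theta_j$, $\alpha_i$, and $\beta_i$ are hypothetical, chosen only to show how each observed pattern receives its probability under the K = 3 Guttman scale.

```python
from itertools import product


def recruitment_prob(u, v, alpha, beta):
    """P(u | v) as in equation (2): one factor per task.

    True score 0: a 1 is an intrusion (alpha_i), a 0 is correct (1 - alpha_i).
    True score 1: a 0 is an omission (beta_i), a 1 is correct (1 - beta_i).
    """
    p = 1.0
    for u_i, v_i, a_i, b_i in zip(u, v, alpha, beta):
        if v_i == 0:
            p *= a_i if u_i == 1 else 1.0 - a_i
        else:
            p *= b_i if u_i == 0 else 1.0 - b_i
    return p


def pattern_prob(u, true_patterns, theta, alpha, beta):
    """P(u) as in equation (1): theta-weighted sum of recruitment probabilities."""
    return sum(t * recruitment_prob(u, v, alpha, beta)
               for t, v in zip(theta, true_patterns))


# Guttman scale for K = 3 (true score patterns v_1 .. v_4 from the text).
GUTTMAN = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]

# Hypothetical parameter values, for illustration only.
theta = [0.4, 0.3, 0.2, 0.1]
alpha = [0.10, 0.20, 0.05]
beta = [0.15, 0.10, 0.20]

# Probability of each of the 2**3 = 8 observed patterns.
probs = {u: pattern_prob(u, GUTTMAN, theta, alpha, beta)
         for u in product((0, 1), repeat=3)}
```

Because every column of the recruitment table sums to one, the eight pattern probabilities also sum to one for any admissible parameter values.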
Estimation Procedures 

For $n = \sum_s n_s$ respondents, the likelihood for the sample is

(3)    $$L = \prod_{s=1}^{2^K} P(u_s)^{n_s}$$

where $u_s$ is an observed 0,1 vector given by $n_s$ respondents. With $q$ a priori
true score patterns, there are $2K + q - 1$ independent parameters to estimate
and problems arise from 2 sources: (I) the parameters may be non-identifiable;
(II) the set of partial derivatives of $L$ with respect to the $\theta_j$, $\alpha_i$, and $\beta_i$
(the normal equations) are, in general, non-linear in the parameters.

(I) Non-Identifiable Models - consistent estimates will not be available
in this circumstance and the model must be restricted suitably to permit
estimation. For example, with true score patterns of $v_1 = (0\ 0\ \ldots\ 0)'$ and
$v_2 = (1\ 1\ \ldots\ 1)'$ only, the model is identifiable so long as $K \ge 3$. However,
with true score patterns representing a linear hierarchy, the model is not
identifiable; the restrictions $\alpha_i = \alpha$ and $\beta_i = \beta$ do result in an identifiable model
so long as $K \ge 4$.

(II) Non-Linear Normal Equations - since the data may be represented as
frequencies of occurrence, $n_s$, for the $2^K$ possible 0,1 outcome patterns, an
iterative maximum likelihood (ML) estimation scheme (Fisher's method of scoring)
can be employed (Rao, 1965). Computer programs written in FORTRAN IV have been
developed around this iterative ML algorithm for the following cases:¹

    MODEL3  - true score patterns are (0 0 ... 0) and (1 1 ... 1) only;
              $\alpha_i = \alpha$ and $\beta_i = \beta$ is assumed (this model is, simply, a
              mixture of two binomials).

    MODEL3G - true score patterns are (0 0 ... 0) and (1 1 ... 1) only;
              $\alpha_i$ and $\beta_i$ are estimated per task (item).

    MODEL5  - true score patterns may be any linear or branching
              hierarchy; $\alpha_i = \alpha$ and $\beta_i = \beta$ are assumed (optionally,
              $\alpha = \beta$ can be imposed as a further restriction).
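To make the MODEL5 estimation problem concrete, the sketch below fits a hierarchy with common error rates $\alpha$ and $\beta$ by the EM algorithm. This is a swapped-in technique: the programs described above use Fisher's method of scoring, which also yields standard errors, whereas EM is used here only because it is short to write down. The function names are hypothetical, the data are the Form I role-conflict frequencies as read from Table I later in the paper (items ordered 4 3 2 1), and no claim is made that this sketch reproduces the published estimates digit for digit.

```python
import math

# True score patterns, items ordered 4 3 2 1.
LINEAR = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)]
BRANCHING = LINEAR + [(1, 0, 1, 0), (1, 1, 0, 1)]

# Observed Form I frequencies n_s, keyed by response pattern.
FORM_I = {
    (0, 0, 0, 0): 42, (1, 0, 0, 0): 23, (0, 1, 0, 0): 6, (0, 0, 1, 0): 6,
    (0, 0, 0, 1): 1, (1, 1, 0, 0): 24, (1, 0, 1, 0): 25, (1, 0, 0, 1): 4,
    (0, 1, 1, 0): 7, (0, 1, 0, 1): 2, (0, 0, 1, 1): 1, (1, 1, 1, 0): 38,
    (1, 1, 0, 1): 9, (1, 0, 1, 1): 6, (0, 1, 1, 1): 2, (1, 1, 1, 1): 20,
}


def recruit(u, v, alpha, beta):
    """P(u | v) with a single intrusion rate alpha and omission rate beta."""
    p = 1.0
    for u_i, v_i in zip(u, v):
        if v_i == 0:
            p *= alpha if u_i == 1 else 1.0 - alpha
        else:
            p *= beta if u_i == 0 else 1.0 - beta
    return p


def log_lik(counts, patterns, theta, alpha, beta):
    """Log of the likelihood in equation (3)."""
    return sum(n_s * math.log(sum(t * recruit(u, v, alpha, beta)
                                  for t, v in zip(theta, patterns)))
               for u, n_s in counts.items())


def em_fit(counts, patterns, n_iter=200):
    """EM iterations for theta_j, alpha, beta; returns estimates and log-lik trace."""
    q = len(patterns)
    n = sum(counts.values())
    theta, alpha, beta = [1.0 / q] * q, 0.1, 0.1   # arbitrary starting values
    trace = []
    for _ in range(n_iter):
        new_theta = [0.0] * q
        intr = intr_opp = omit = omit_opp = 0.0
        for u, n_s in counts.items():
            joint = [t * recruit(u, v, alpha, beta) for t, v in zip(theta, patterns)]
            p_u = sum(joint)
            for j, v in enumerate(patterns):
                w = n_s * joint[j] / p_u        # expected count of type j giving u
                new_theta[j] += w
                for u_i, v_i in zip(u, v):
                    if v_i == 0:
                        intr_opp += w
                        intr += w * u_i          # observed 1 where true score is 0
                    else:
                        omit_opp += w
                        omit += w * (1 - u_i)    # observed 0 where true score is 1
        theta = [t / n for t in new_theta]
        alpha, beta = intr / intr_opp, omit / omit_opp
        trace.append(log_lik(counts, patterns, theta, alpha, beta))
    return theta, alpha, beta, trace


theta_hat, alpha_hat, beta_hat, trace = em_fit(FORM_I, BRANCHING)
```

The log-likelihood trace should be non-decreasing from one iteration to the next, which gives a quick check that the update steps are coded consistently with equation (3).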

Assessing Fit of the Model

Standard (Pearson) chi-square goodness-of-fit tests can be utilized by
computing "expected" frequencies for each of the $2^K$ possible response patterns.
Let

(4)    $$\hat P(u_s) = \sum_{j=1}^{q} \hat P(u_s \mid v_j)\,\hat\theta_j$$

be found by substituting ML estimates in (1) and (2). Then, the expected
frequencies are given by

(5)    $$\hat n_s = n \cdot \hat P(u_s)$$

and the Pearson chi-square statistic is

(6)    $$C_P = \sum_{s=1}^{2^K} \left[ (n_s - \hat n_s)^2 / \hat n_s \right]$$

which can be evaluated (for "large" n) as chi-square with degrees of freedom
equal to $2^K - q' - 1$, where $q'$ is the number of independent parameters estimated
under the model (e.g., $q' = 3$ in MODEL3; $q' = 2K + 1$ in MODEL3G; $q' = q + 1$ in
MODEL5).

As an alternative to the Pearson chi-square statistic, the fitted model
can be compared to the best-fitting multinomial density by a likelihood ratio
test. The estimators for $P(u_s)$ under the multinomial model are $n_s/n$ and the
likelihood ratio is

(7)    $$\lambda = \prod_{s=1}^{2^K} \left[ \hat P(u_s) / (n_s/n) \right]^{n_s}$$

where $\hat P(u_s)$ is as defined in (4). For "large" n, $c_L = -2\log_e \lambda$ is a chi-square
statistic with $2^K - q' - 1$ degrees of freedom (the Pearson and likelihood ratio
statistics are asymptotically equivalent).

¹Single copies of program listings and a user's manual available by writing
the authors at Department of Measurement & Statistics, College of Education,
University of Maryland, College Park, Md. 20742.
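The two fit statistics in (6) and (7) are simple to compute once expected frequencies are available. The sketch below uses hypothetical function names and a small made-up set of frequencies; it relies on the identity $-2\log_e\lambda = 2\sum_s n_s \log_e(n_s/\hat n_s)$, which follows from (5) and (7), and it treats empty cells as contributing zero to the likelihood ratio statistic.

```python
import math


def pearson_chi_square(observed, expected):
    """Equation (6): sum of (n_s - nhat_s)^2 / nhat_s over response patterns."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_chi_square(observed, expected):
    """-2 log(lambda) from equation (7), written as 2 * sum n_s log(n_s / nhat_s)."""
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def degrees_of_freedom(K, q_prime):
    """2^K - q' - 1, with q' the number of independent parameters estimated."""
    return 2 ** K - q_prime - 1


# Hypothetical frequencies, for illustration only.
observed = [30, 20, 25, 25]
expected = [25.0, 25.0, 25.0, 25.0]
c_p = pearson_chi_square(observed, expected)            # 2.0
c_l = likelihood_ratio_chi_square(observed, expected)

# MODEL5 with K = 4: a linear hierarchy has q = 5 patterns, so q' = q + 1 = 6.
df_linear = degrees_of_freedom(4, 6)                     # 9
```

For K = 4, a hierarchy with q = 7 true score patterns gives q' = 8 and therefore 7 degrees of freedom under the same rule.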

Comparisons among Models

Two different forms of the probabilistic model in (1) and (2) can be
compared on the same set of data if one model can be derived from the other by
imposing linear restrictions on the parameters. For example, MODEL3 can be
derived from MODEL3G by setting $\alpha_i = \alpha$ and $\beta_i = \beta$; thus, the relative fits of these
two models can be compared. Similarly, there is great flexibility in comparing
different hierarchic structures under MODEL5. Models related in the above
manner are described as exhibiting "subset inclusion" among the parameters.
By an extension of the likelihood ratio test for fit in (7), we can compare models
obeying subset inclusion. Assume that the more complex model is based on
fitting $r_1$ parameters, while the less complex model involves $r_2 < r_1$ parameters;
that is, $r_1 - r_2$ restrictions have been imposed when deriving the second model
from the first model. Let $\lambda_1$ and $\lambda_2$ be likelihood ratios derived as in (7) for
the respective models. Then, $\chi^2 = -2\log_e(\lambda_2/\lambda_1) = c_2 - c_1$ is a chi-square
statistic with $r_1 - r_2$ degrees of freedom, and this statistic provides a basis for
deciding whether or not the more restricted form of the model is a poorer fit to
the data than the more complex model.
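The step from the two likelihood ratios to a difference of fit statistics can be written out explicitly. In the sketch below, $\hat P_1$ and $\hat P_2$ denote the fitted pattern probabilities under the more complex and the more restricted model; these subscripted symbols are introduced here for illustration and do not appear in the original notation.

```latex
\begin{aligned}
\frac{\lambda_2}{\lambda_1}
  &= \prod_{s=1}^{2^K}
     \left[\frac{\hat P_2(u_s)/(n_s/n)}{\hat P_1(u_s)/(n_s/n)}\right]^{n_s}
   = \prod_{s=1}^{2^K}
     \left[\frac{\hat P_2(u_s)}{\hat P_1(u_s)}\right]^{n_s}, \\[4pt]
-2\log_e\frac{\lambda_2}{\lambda_1}
  &= \left(-2\log_e\lambda_2\right) - \left(-2\log_e\lambda_1\right)
   = c_2 - c_1 .
\end{aligned}
```

The multinomial terms $n_s/n$ cancel, so the comparison depends only on the two fitted models, and it can be obtained by subtracting the two goodness-of-fit statistics computed from (7).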
Cross-validation of Models

The same form of the probabilistic model (e.g., linear hierarchy) may be
posited for samples which differ systematically (e.g., males and females).
Although some general procedures can be used to compare the consistency of
observed frequencies in different samples (chi-square or Kolmogorov-Smirnov
statistics), the comparability of parameter estimates can be assessed by a double
cross-validation technique. That is, parameter estimates for the relevant
parameters ($\theta_j$, $\alpha_i$, etc.) are derived from each sample separately and then
fitted to frequencies from the other sample. With appropriate modifications to
degrees of freedom ($2^K - 1$ rather than $2^K - q' - 1$), the cross-validation
chi-squares for goodness-of-fit provide evidence for the consistency of parameter
estimates across samples.
Significance Tests for Parameters

Inter-sample and intra-sample significance tests are available for individual
parameters (assuming large samples) since the iterative ML estimation procedure
yields asymptotic sampling variance-covariance estimates. Appropriate intra-sample
hypotheses are $\theta_j = D_j$, $\alpha_i = A_i$, or $\beta_i = B_i$, where $D_j$, $A_i$, and $B_i$ are ordinarily
0, and the test statistics are

(8)    $$z = (\hat\theta_j - D_j)/\hat\sigma_{\hat\theta_j}, \qquad z = (\hat\alpha_i - A_i)/\hat\sigma_{\hat\alpha_i}, \qquad \text{etc.}$$

If several such tests are conducted for the same set of data and simultaneous
control of the Type I error rate is desired, the Bonferroni (Fisher-Dunn) approach
is generally appropriate and involves merely the setting of the significance
level per test at 1/m of the total desired Type I error rate (where m is the
number of statistical tests being conducted). In addition, it is possible to
test hypotheses based on alleged relationships among subsets of the parameters.
A common example involves MODEL5, where the equality of intrusion and omission
error rates would imply the hypothesis $\alpha = \beta$. Since "large" sample estimates
of the relevant variances and covariances are available, the test can be set
up as

(9)    $$z = (\hat\alpha - \hat\beta) \big/ \sqrt{\hat\sigma^2_{\hat\alpha} + \hat\sigma^2_{\hat\beta} - 2\hat\sigma_{\hat\alpha\hat\beta}}$$

Similarly, inter-sample tests for hypotheses such as $\alpha_{i1} = \alpha_{i2}$ are
of the form

(10)   $$z = (\hat\alpha_{i1} - \hat\alpha_{i2}) \big/ \sqrt{\hat\sigma^2_{\hat\alpha_{i1}} + \hat\sigma^2_{\hat\alpha_{i2}}}$$

where the sample is referenced by the second subscript (e.g., $\hat\alpha_{i1}$ is the estimate
for the "guessing" error rate on task i in sample 1).
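The within-sample test of $\alpha = \beta$ and the Bonferroni adjustment can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, the estimates, variances, and covariance are made-up numbers rather than output from the authors' programs, and the normal approximation for the two-tailed probability relies on the large-sample argument given above.

```python
import math


def z_within_sample(est_a, est_b, var_a, var_b, cov_ab):
    """z for H0: a = b when both estimates come from the same sample.

    The covariance term is needed because the two estimates are correlated.
    """
    return (est_a - est_b) / math.sqrt(var_a + var_b - 2.0 * cov_ab)


def two_tailed_p(z):
    """Two-tailed normal probability for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_level(total_alpha, m):
    """Per-test significance level when m tests share a total Type I error rate."""
    return total_alpha / m


# Hypothetical values, for illustration only.
z = z_within_sample(est_a=0.10, est_b=0.16,
                    var_a=0.0009, var_b=0.0016, cov_ab=0.0002)
p = two_tailed_p(z)
per_test = bonferroni_level(0.05, 5)     # 0.01 per test for 5 tests
reject = p < per_test
```

With these illustrative numbers the difference is not significant at the adjusted level, which shows how the per-test criterion tightens as the number of tests grows.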



II. APPLICATIONS OF MODELS

In this section of the paper two sets of data are used as the basis of
separate analyses in order to provide examples of a variety of analytic procedures
that can be applied within the context of the models.

Role Conflict Example

The first example is based on the data from a study by Stouffer and Toby
(1951) dealing with individuals' role conflict in determining "the proper thing
to do" in a morally conflicting situation involving conflicts between obligations
to a friend and more general social obligations.

These data are based on two forms of a four item questionnaire, both of which
were completed by 216 randomly assigned undergraduate students.

For form I (Ego faces dilemma) of the questionnaire, the respondent was
faced with the following role conflicts:

    1. You are riding in a car driven by a close friend, and he hits
    a pedestrian. You know he was going at least 35 miles an hour in a
    20-mile-an-hour speed zone. There are no other witnesses. His lawyer
    says that if you testify under oath that the speed was only 20 miles
    an hour, it may save him from serious consequences. What do you
    think you'd probably do in view of the obligations of a sworn witness
    and the obligation to your friend?

    Check one:
        Testify that he was going 20 miles an hour
        Not testify that he was going 20 miles an hour

    2. You are a New York drama critic. A close friend of yours has
    sunk all his savings in a new Broadway play. You really think the
    play is no good. Would you go easy on his play in your review in
    view of your obligations to your readers and your obligation to
    your friend?

    Check one:
        Yes
        No

    3. You are a doctor for an insurance company. You examine a
    close friend who needs more insurance. You find that he is in
    pretty good shape, but you are doubtful on one or two minor points
    which are difficult to diagnose. Would you shade the doubts in
    his favor in view of your obligations to the insurance company
    and your obligation to your friend?

    Check one:
        Yes
        No

    4. You have just come from a secret meeting of the board of
    directors of a company. You have a close friend who will be
    ruined unless he can get out of the market before the board's
    decision becomes known. You happen to be having dinner at that
    friend's home this same evening. Would you tip him off in view
    of your obligations to the company and your obligation to your
    friend?

    Check one:
        Yes
        No

For form II (Friend faces dilemma) of the questionnaire, the stories were
rewritten so that a friend of the respondent was faced with the same dilemmas.

On the basis of a Guttman scalogram analysis, Stouffer & Toby (1951) posited
that there may be a linear scale underlying their instrument. They state:

    This fusion of variables in our situation does seem to generate
    a unidimensional scale, the dimension involved being the degree
    of strength of a latent tendency to be loyal to a friend even
    at the cost of other principles. The rank groupings would
    represent ordered degrees of probability of taking the friend's
    side in a role conflict (p. 400).

The Guttman scalogram analysis for both questionnaires resulted in the following
order of items: 4, 3, 2, 1, where all preceding items are considered to be
conditional prerequisites for responding positively to an item. This ordering
resulted in reproducibility coefficients of .92 and .91, respectively, for
forms I and II. These values are both larger than the minimally sufficient value
of .90 suggested by Guttman as necessary for a linear scale (Torgerson, 1958).
However, as Stouffer & Toby point out, there are two response patterns (1 1 0 1
and 1 0 1 0 for items 4 through 1, respectively, where a "1" indicates a yes
response to an item and a "0" indicates a no response to an item) with relatively
high frequencies of occurrence which are not compatible with the linear scale
(see Tables I & II).

If these two response patterns are added as "true score" response patterns
to those true score patterns for a linear scale (see footnotes to Tables I and II),
a resulting "branching hierarchy" (as described in Macready, 1975) is obtained in
which the same conditional relations are present as for the posited linear scale



Table I

Response Frequencies and Tests of Model Fit for Role Conflict Data - Form I

Response Pattern      Observed        Expected Frequencies
(Items: 4 3 2 1)      Frequencies     Linear Model    Branching Hierarchy

0 0 0 0 (a,b)             42             41.057           41.400
1 0 0 0 (a,b)             23             24.102           23.443
0 1 0 0                    6              8.248            5.903
0 0 1 0                    6              5.640            6.138
0 0 0 1                    1              1.899            1.742
1 1 0 0 (a,b)             24             22.169           24.002
1 0 1 0 (b)               25             14.117           25.214
1 0 0 1                    4              2.567            2.565
0 1 1 0                    7             13.608            7.335
0 1 0 1                    2              2.058            1.973
0 0 1 1                    1              1.974             .899
1 1 1 0 (a,b)             38             41.909           37.573
1 1 0 1 (b)                9              6.249            9.947
1 0 1 1                    6              5.990            4.417
0 1 1 1                    2              5.974            3.813
1 1 1 1 (a,b)             20             18.441           19.637

Reproducibility Coefficient: .92

Chi-Square Tests        Linear Model    Difference in Fit    Branching Hierarchy
Goodness of Fit            18.5657                                 2.7684
Difference in Fit                            15.7973
Degrees of Freedom            9                  2                    7
P-Value                      .029               .000                 .906

(a) True score response patterns for the linear scale model.
(b) True score response patterns for the branching hierarchy model.




Table II

Response Frequencies and Tests of Model Fit for Role Conflict Data - Form II

Response Pattern      Observed        Expected Frequencies
(Items: 4 3 2 1)      Frequencies     Linear Model    Branching Hierarchy

0 0 0 0 (a,b)             37             29.783           37.028
1 0 0 0 (a,b)             31             31.679           30.688
0 1 0 0                    5             11.321            6.810
0 0 1 0                    6              6.629            4.520
0 0 0 1                    2              5.948            2.270
1 1 0 0 (a,b)             29             32.895           27.570
1 0 1 0 (b)               15             10.209           16.281
1 0 0 1                    4              6.918            5.111
0 1 1 0                    6              6.251            5.256
0 1 0 1                    3              2.960            4.248
0 0 1 1                    3              2.048             .967
1 1 1 0 (a,b)             25             25.975           25.673
1 1 0 1 (b)               23             10.064           20.701
1 0 1 1                    4              5.658            4.530
0 1 1 1                    3              4.884            4.094
1 1 1 1 (a,b)             20             22.783           20.173

Reproducibility Coefficient: .91

Chi-Square Tests        Linear Model    Difference in Fit    Branching Hierarchy
Goodness of Fit            27.9201                                 5.3686
Difference in Fit                            22.5515
Degrees of Freedom            9                  2                    7
P-Value                      .001               .000                 .615

(a) True score response patterns for the linear scale model.
(b) True score response patterns for the branching hierarchy model.



except that:

(a) item 3 is not a conditional prerequisite for item 2, and

(b) item 2 is not a conditional prerequisite for item 1.

If it is assumed that $\alpha_i = \alpha$ and $\beta_i = \beta$ for i = (1,2,3,4), then both the linear
model and the branching hierarchy described above are special cases of Model 5
described in part I of this paper. The resulting expected frequencies for the
response patterns (based on maximum likelihood parameter estimates) under each
of the above models are presented in Tables I & II, for questionnaire forms I &
II respectively. Note that the estimated frequencies for the branching hierarchy
in most cases provide closer approximations to the observed frequencies than
those obtained under the linear model. As might be expected, chi-square tests
of fit for each test form with level of significance set at .05 resulted in
"acceptable fit" only for the branching hierarchy. In addition, the branching
hierarchy model was found to provide significantly better fit than was provided
by the linear model (see Tables I & II).
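The P-values attached to these chi-square statistics can be recomputed from the statistic and its degrees of freedom. The sketch below is one way to do so using only the Python standard library; the function name `chi2_sf` is hypothetical, and the upper-tail probability is built from the standard recurrence for the regularized incomplete gamma function rather than from any routine described in the paper. The input values are the Form I statistics as read from Table I.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df (x > 0).

    Uses Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1), with y = x / 2,
    starting from Q(1, y) = e^(-y) for even df or Q(1/2, y) = erfc(sqrt(y))
    for odd df.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Form I statistics as read from Table I.
fit_linear, df_linear = 18.5657, 9
fit_branching, df_branching = 2.7684, 7

# Difference in fit for the nested comparison (linear within branching).
diff = fit_linear - fit_branching
df_diff = df_linear - df_branching

p_linear = chi2_sf(fit_linear, df_linear)
p_branching = chi2_sf(fit_branching, df_branching)
p_diff = chi2_sf(diff, df_diff)
```

The difference statistic and its 2 degrees of freedom follow directly from subtracting the two rows of the table, which is the comparison described in the section on comparisons among models.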

Equality between corresponding parameters under the branching hierarchy
model for the two questionnaire forms was simultaneously tested via a double
cross-validation procedure. This analysis, the results of which are presented
in Table III, led to the rejection of the hypothesis of equality when a .05 level
of significance was used. This is supportive evidence for separate post hoc
comparisons testing equality of values for each parameter found under the two forms.

Table III

Double Cross-validation for the Hierarchic Model across Forms of Questionnaire

        Questionnaire Form
Parameter estimates    Fitted data     Chi-Square     df     P-Value

        A                  B             30.600       15      .010
        B                  A             26.305       15      .035



Table IV

Maximum Likelihood Parameter Estimates and their Standard Errors for Role Conflict Data

                              Form I                              Form II                     Comparisons for
True Score          Linear Model    Hierarchic Model    Linear Model    Hierarchic Model     Hierarchic Model
Patterns            Param.  Std.    Param.  Std.        Param.  Std.    Param.  Std.         z        2-tailed
(Items: 4 3 2 1)    est.    error   est.    error       est.    error   est.    error        score    P-Value

0 0 0 0             .1765   .041    .1961   .039        .2281   .062    .1679   .042           .49      .62
1 0 0 0             .1080   .036    .0871   .031        .2019   .052    .1376   .037         -1.03      .30
1 1 0 0             .0703   .040    .1072   .034        .2274   .052    .1331   .037      [illegible]   .60
1 1 1 0             .3995   .058    .2678   .046        .1611   .049    .1724   .040          1.57      .12
1 1 1 1             .2457   .051    .1722   .043        .1815   .051    .1809   .047          -.14      .89
1 0 1 0                             .1236   .034                        .0742   .029          1.32      .26
1 1 0 1                             .0460   .024                        .1339   .033         -2.15      .03

Response Errors
$\alpha$            .0311   .031    .0327   .029        .1628   .039    .0380   .037           .11      .91
$\beta$             .2446   .031    .1625   .034        .1714   .045    .1686   .036           .13      .90



The maximum likelihood parameter estimates and their corresponding
estimated standard errors obtained under each of the described models for
forms I & II are presented in Table IV. Note that under the hierarchic models
there are "moderate" (relative to the standard errors) differences found between
the form estimates of corresponding proportions for some of the true score
response patterns (namely 1 0 0 0, 1 1 1 0, 1 0 1 0 and 1 1 0 1). However, a
significant difference between the estimated proportions occurred only in the
case of response pattern 1 1 0 1. On the other hand, corresponding estimates
of each of the error parameters show extremely small differences. These combined
findings provide support for the contention that differences that do exist for
the two procedures of testing do not affect "error rates" but do affect the
proportion of individuals found within each of the true score response patterns.
The specific nature of this effect is, however, at best vague.
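The inter-form comparisons in Table IV follow the form of equation (10). As a rough check, the sketch below recomputes z scores and two-tailed probabilities from the rounded hierarchic-model estimates and standard errors as read from Table IV; the function names are hypothetical, and small discrepancies from the published z scores are to be expected because the tabled standard errors are rounded to three decimals.

```python
import math


def z_between_forms(est_1, se_1, est_2, se_2):
    """Equation (10): independent samples, so the variances simply add."""
    return (est_1 - est_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)


def two_tailed_p(z):
    """Two-tailed normal probability for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hierarchic model, as read from Table IV:
# pattern -> (Form I estimate, Form I SE, Form II estimate, Form II SE)
HIERARCHIC = {
    "0000": (0.1961, 0.039, 0.1679, 0.042),
    "1000": (0.0871, 0.031, 0.1376, 0.037),
    "1100": (0.1072, 0.034, 0.1331, 0.037),
    "1110": (0.2678, 0.046, 0.1724, 0.040),
    "1111": (0.1722, 0.043, 0.1809, 0.047),
    "1010": (0.1236, 0.034, 0.0742, 0.029),
    "1101": (0.0460, 0.024, 0.1339, 0.033),
}

results = {}
for pattern, (e1, s1, e2, s2) in HIERARCHIC.items():
    z = z_between_forms(e1, s1, e2, s2)
    results[pattern] = (z, two_tailed_p(z))

# Patterns whose unadjusted two-tailed probability falls below .05.
significant = [p for p, (_, prob) in results.items() if prob < 0.05]
```

Run on these values, only pattern 1 1 0 1 falls below the .05 level, in line with the statement above that it is the sole significant difference.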

Based on the differences under the hierarchic model in the estimated true
score proportions of individuals who "should" respond positively to items 1
through 4 for forms I & II (these differences are respectively .095, +.13,
-.03 and -.03), the following conjecture seems appropriate: individuals under
questionnaire form I tend to produce more simultaneous positive responses on
items 1 and 3, which are in one branch of the posited hierarchy, while individuals
under questionnaire form II tend to produce more positive responses to item 2,
which is at the end of the other branch of the hierarchy.

Domain Referenced Testing Example

The second example is based on the data from a study by Macready and Dayton
(1976) in which the relations among items from a single domain in a Domain
Referenced Test (DRT) were investigated. This domain contains items involving
integer multiplication in which: (a) the [illegible] digits; (b) the multiplicand
has either 3 or [illegible]; (c) there is at least one "carry" operation for each
digit in the multiplier. The specific data considered is based on dichotomous item
scores (i.e., the scores 1 & 0 indicate respectively passing and failing the
item in question) obtained on 4 randomly selected items from the specified
domain for 284 fourth grade students.

Macready and Merwin (1973) as well as Harris (1974) have suggested that
the construction and revision of item domains be based on the homogeneity of
item content and the internal consistency of examinees' item responses so that
it is more reasonable to assume that a specified individual either has acquired
the necessary concepts and/or skills to respond correctly to (a) all items
within the domain or (b) none of the items within the domain. If this kind of
relation holds for the items within a domain then the only true score response
patterns are 0 0 0...0 and 1 1 1...1 (i.e., a response pattern other than all
zeros or all ones occurs only because of guessing or forgetting errors
on one or more of the items). Thus Model 3G (in which the only true score
patterns are [0 0 0...0] and [1 1 1...1] and for each item "i", $\alpha_i$ and $\beta_i$ are
respectively guess and forget errors) and Model 3 (which is a special case of
Model 3G in which $\alpha_i = \alpha$ and $\beta_i = \beta$ for all i) are appropriate models for the
assessment of the nature and relations among items within domains.

In this DRT example, the item scores for 142 randomly selected students from
the original sample of 284 were used to generate maximum likelihood estimates of
the parameters and their standard errors under both Models 3 and 3G (presented in
Table V). The data for the remaining 142 students were used as a cross-validation
sample.

Note that the estimated $\alpha$'s, under both models, are relatively small in
magnitude (except $\hat\alpha_1$) when compared to their standard errors and the corresponding
$\beta$-values. In fact $\hat\alpha_1$ is the only guess error parameter that differs significantly
from zero. This outcome was expected since the items were presented in free
response format.

On the basis of the parameter estimates presented in Table V, expected



Table V

Maximum Likelihood Parameter Estimates and Standard Errors for DRT Data

             Model 3G                                Model 3
Parameter    Estimated    Std.         Parameter    Estimated    Std.
             value        error                     value        error

$\theta$        .41       .063         $\theta$        .40       .068
$\alpha_1$      .21       .067         $\alpha$        .08       .036
$\alpha_2$      .07       .062         $\beta$         .34       .041
$\alpha_3$      .02       .029
$\alpha_4$      .05       .053
$\beta_1$       .25       .059
$\beta_2$       .22       .062
$\beta_3$       .57       .063
$\beta_4$       .29       .065



Table VI

Response Frequencies and Tests of Model Fit for DRT Data

                  Observed Freq.                 Expected Freq.
Response      Validation    Cross-val.
Pattern       sample        sample            Model 3G    Model 3

0 0 0 0          42            41               41.04      41.07
1 0 0 0          13            12               12.91       5.95
0 1 0 0           6            10                5.62       5.95
0 0 1 0           1             3                1.30       5.95
0 0 0 1      [illegible]        3                4.04       5.95
1 1 0 0      [illegible]        8                8.92       4.68
1 0 1 0           3             1                1.93       4.68
1 0 0 1           6             2                6.13       4.68
0 1 1 0           2             2                2.08       4.68
0 1 0 1           5             5                6.61       4.68
0 0 1 1           4             1                1.42       4.68
1 1 1 0           7             8                6.19       8.32
1 1 0 1          23            16               19.74       8.32
1 0 1 1           1             4                4.22       8.32
0 1 1 1           4             6                4.90       8.32
1 1 1 1          15            20               14.95      15.82




frequencies corresponding to each of the 16 possible response patterns were
generated. These expected frequencies along with the observed frequencies
for both the validation and cross-validation samples are presented in Table VI.
These frequencies were used in the statistical assessment of fit provided by
the models.

Table VII presents results of chi-square tests used in assessing both
absolute and relative fit provided by Models 3 and 3G. Chi-square results
related to model validation and cross-validation suggest reasonable absolute
fit only for Model 3G.
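The cross-validation chi-squares can be approximately recomputed by applying the Pearson statistic in (6) to the cross-validation frequencies and the expected frequencies fitted on the validation sample. The sketch below is illustrative only: the function name is hypothetical, and the cell values are readings of Table VI from a poor-quality scan, so several entries are reconstructions rather than certified figures. Agreement with the published statistics is expected only up to the rounding of the tabled expected frequencies.

```python
def pearson_chi_square(observed, expected):
    """Equation (6) applied to any pairing of observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Rows follow the pattern order of Table VI (0000, 1000, ..., 1111).
CROSS_VAL_OBSERVED = [41, 12, 10, 3, 3, 8, 1, 2, 2, 5, 1, 8, 16, 4, 6, 20]
EXPECTED_3G = [41.04, 12.91, 5.62, 1.30, 4.04, 8.92, 1.93, 6.13,
               2.08, 6.61, 1.42, 6.19, 19.74, 4.22, 4.90, 14.95]
EXPECTED_3 = [41.07, 5.95, 5.95, 5.95, 5.95, 4.68, 4.68, 4.68,
              4.68, 4.68, 4.68, 8.32, 8.32, 8.32, 8.32, 15.82]

chi_3g = pearson_chi_square(CROSS_VAL_OBSERVED, EXPECTED_3G)
chi_3 = pearson_chi_square(CROSS_VAL_OBSERVED, EXPECTED_3)

# No parameters are estimated from the cross-validation sample,
# so the degrees of freedom are 2^K - 1 with K = 4 items.
df_cross_val = 2 ** 4 - 1
```

Both recomputed values land close to the cross-validation chi-squares reported for the two models (12.997 and 34.173), which also supports the reading of the less legible cells in the cross-validation column.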

The chi-square test related to relative fit provided by the two models
resulted in significantly better fit for Model 3G. This may be interpreted as
evidence supportive of the contention that the restrictions $\alpha_i = \alpha$ and/or
$\beta_i = \beta$ do not hold for all i. The large estimated value for $\beta_3$ appears to be
a logically unreasonable estimate; this suggests the possible need for
subdividing or otherwise restructuring this domain.

Note that it may be desirable to classify examinees obtaining each response
pattern "j" in such a way that misclassification of the two true score "types"
(0 0 0 0 and 1 1 1 1, which could be dubbed respectively "non-masters" and
"masters") is minimized.

Given that the models are "adequate" representations of the behavior being
assessed, placement may be implemented by comparing the relative magnitudes
of the estimated joint proportions for each response pattern "j" with each true
score type (i.e., $P(j \cap 0000)$ and $P(j \cap 1111)$, which are presented in Table VIII)
and classifying examinees obtaining response pattern "j" as:

(a) "masters" if $P(j \cap 0000) \le P(j \cap 1111)$
(b) "non-masters" if $P(j \cap 0000) > P(j \cap 1111)$.

For Model 3G, the response patterns designating "non-master" status are:
0 0 0 0, 1 0 0 0, 0 1 0 0, 0 0 1 0 and 0 0 0 1, which results
in an expected proportion of misclassified examinees of .0703. For Model 3,



Table VII

Statistical Tests of Model Fit for DRT Data

Assessment                     Model 3G     Model 3

Model Validation
    Chi-Square                   9.459       51.758
    Degrees of Freedom             6           12
    P-Value                       .149         .000

Model Cross-Validation (a)
    Chi-Square                  12.997       34.173
    Degrees of Freedom            15           15
    P-Value                       .603         .003

Comparison of Models
    Chi-Square                        52.643
    Degrees of Freedom                   6
    P-Value                             .000

(a) Model cross-validation was based on fit provided by
the original expected frequencies to the observed
frequencies obtained from the 142 students not used
in parameter estimation.



Table VIII

Estimated Joint Proportions of Response Patterns and Mastery States for DRT Data

                       Model 3G                                Model 3
Response
Pattern j     $P(j \cap 0000)$   $P(j \cap 1111)$     $P(j \cap 0000)$   $P(j \cap 1111)$

0 0 0 0           .2837              .0053                .2808              .0084
1 0 0 0           .0748              .0161                .0259              .0160
0 1 0 0           .0208              .0188                .0259              .0160
0 0 1 0           .0052              .0040                .0259              .0160
0 0 0 1           .0156              .0128                .0259              .0160
1 1 0 0           .0055              .0573                .0024              .0306
1 0 1 0           .0014              .0123                .0024              .0306
1 0 0 1           .0041              .0391                .0024              .0306
0 1 1 0           .0004              .0142                .0024              .0306
0 1 0 1           .0011              .0454                .0024              .0306
0 0 1 1           .0003              .0097                .0024              .0306
1 1 1 0           .0001              .0435                .0002              .0583
1 1 0 1           .0003              .1387                .0002              .0583
1 0 1 1           .0001              .0297                .0002              .0583
0 1 1 1           .0000              .0345                .0002              .0583
1 1 1 1           .0000              .1053                .0000              .1114




the same classification decisions are reached as above; however, the expected
proportion of misclassified examinees is .0876.
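The classification rule and the two misclassification rates can be retraced from the joint proportions. The sketch below uses hypothetical names and takes its inputs from Table VIII as read from the scan; under Model 3 the joint proportions depend only on the number of items passed, so that table is stored by count of ones. Each pattern is assigned to the mastery state with the larger joint proportion, and the smaller one is accumulated as the expected misclassification.

```python
# Model 3G joint proportions from Table VIII:
# pattern -> (P(j and 0000), P(j and 1111)).
JOINT_3G = {
    "0000": (0.2837, 0.0053), "1000": (0.0748, 0.0161),
    "0100": (0.0208, 0.0188), "0010": (0.0052, 0.0040),
    "0001": (0.0156, 0.0128), "1100": (0.0055, 0.0573),
    "1010": (0.0014, 0.0123), "1001": (0.0041, 0.0391),
    "0110": (0.0004, 0.0142), "0101": (0.0011, 0.0454),
    "0011": (0.0003, 0.0097), "1110": (0.0001, 0.0435),
    "1101": (0.0003, 0.1387), "1011": (0.0001, 0.0297),
    "0111": (0.0000, 0.0345), "1111": (0.0000, 0.1053),
}

# Model 3 joint proportions, indexed by the number of items passed.
MODEL3_BY_ONES = {0: (0.2808, 0.0084), 1: (0.0259, 0.0160),
                  2: (0.0024, 0.0306), 3: (0.0002, 0.0583),
                  4: (0.0000, 0.1114)}
JOINT_3 = {p: MODEL3_BY_ONES[p.count("1")] for p in JOINT_3G}


def classify(joint):
    """Apply rules (a) and (b); return labels and expected misclassification."""
    labels, misclassified = {}, 0.0
    for pattern, (p_non, p_mast) in joint.items():
        if p_non > p_mast:
            labels[pattern] = "non-master"
            misclassified += p_mast    # masters wrongly placed as non-masters
        else:
            labels[pattern] = "master"
            misclassified += p_non     # non-masters wrongly placed as masters
    return labels, misclassified


labels_3g, error_3g = classify(JOINT_3G)
labels_3, error_3 = classify(JOINT_3)
non_masters_3g = [p for p, lab in labels_3g.items() if lab == "non-master"]
```

Choosing the state with the larger joint proportion for every pattern minimizes the total expected misclassification, since each pattern contributes the smaller of its two joint proportions to the error.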



REFERENCES

Dayton, C. M. & Macready, G. B. A probabilistic model for validation
    of behavioral hierarchies. Psychometrika, 1976, 41, 189-204.

Harris, C. W. Some technical characteristics of mastery tests.
    In C. W. Harris, M. C. Alkin & W. J. Popham (Eds.), Problems in
    Criterion-Referenced Measurement. Los Angeles: Center for the
    Study of Evaluation, U.C.L.A., 1974, 98-115.

Macready, G. B. The structure of domain hierarchies found within a
    domain referenced testing system. Educational and Psychological
    Measurement, 1975, 35, 583-598.

Macready, G. B. & Dayton, C. M. The use of probabilistic models in the
    assessment of mastery. Journal of Educational Statistics, 1977, 2,
    (In press).

Macready, G. B. & Merwin, J. C. Homogeneity within item forms in domain
    referenced testing. Educational and Psychological Measurement,
    1973, 33, 351-360.

Rao, C. R. Linear statistical inference and its applications. New York:
    John Wiley, 1965.

Stouffer, S. A. & Toby, J. Role conflict and personality. The American
    Journal of Sociology, March, 1951, 56, 395-406.

Torgerson, W. S. Theory and Methods of Scaling. John Wiley & Sons,
    New York, 1958.



